.\" Copyright (C) 2019 Jens Axboe <axboe@kernel.dk>
.\" Copyright (C) 2019 Red Hat, Inc.
.\"
.\" SPDX-License-Identifier: LGPL-2.0-or-later
.\"
.TH IO_URING_REGISTER 2 2019-01-17 "Linux" "Linux Programmer's Manual"
.SH NAME
io_uring_register \- register files or user buffers for asynchronous I/O
.SH SYNOPSIS
.nf
.BR "#include <linux/io_uring.h>"
.PP
.BI "int io_uring_register(unsigned int " fd ", unsigned int " opcode ,
.BI "                      void *" arg ", unsigned int " nr_args );
.fi
.SH DESCRIPTION
.PP
The
.BR io_uring_register ()
system call registers resources (e.g. user buffers, files, eventfd,
personality, restrictions) for use in an
.BR io_uring (7)
instance referenced by
.IR fd .
Registering files or user buffers allows the kernel to take long-term
references to internal data structures or create long-term mappings of
application memory, greatly reducing per-I/O overhead.

.I fd
is the file descriptor returned by a call to
.BR io_uring_setup (2).
.I opcode
can be one of:

.TP
.B IORING_REGISTER_BUFFERS
.I arg
points to a
.I struct iovec
array of
.I nr_args
entries.  The buffers associated with the iovecs will be locked in
memory and charged against the user's
.B RLIMIT_MEMLOCK
resource limit.  See
.BR getrlimit (2)
for more information.  Additionally, there is a size limit of 1GiB per
buffer.  Currently, the buffers must be anonymous, non-file-backed
memory, such as that returned by
.BR malloc (3)
or
.BR mmap (2)
with the
.B MAP_ANONYMOUS
flag set.  It is expected that this limitation will be lifted in the
future. Huge pages are supported as well. Note that the entire huge
page will be pinned in the kernel, even if only a portion of it is
used.
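
As a minimal sketch, the helper below checks an iovec array against the two
limits described above (a non-zero count and at most 1 GiB per buffer) before
issuing the raw system call through syscall(2). The name register_buffers()
and the decision to pre-check in user space are assumptions for illustration,
not part of the kernel interface. The errno values it sets mirror the EINVAL
and EFAULT cases listed under ERRORS.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/io_uring.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/uio.h>
#include <unistd.h>

/* Per-buffer size limit described above. */
#define FIXED_BUF_MAX ((size_t) 1 << 30)

/* Hypothetical helper: validate the iovecs, then register them. */
static int register_buffers(int ring_fd, struct iovec *iovs, unsigned int nr)
{
    if (nr == 0) {
        errno = EINVAL;          /* the kernel also rejects nr_args == 0 */
        return -1;
    }
    for (unsigned int i = 0; i < nr; i++) {
        if (iovs[i].iov_len > FIXED_BUF_MAX) {
            errno = EFAULT;      /* same error the kernel reports for > 1 GiB */
            return -1;
        }
    }
    /* Buffers are expected to be anonymous memory, e.g. from malloc(3). */
    return (int) syscall(__NR_io_uring_register, ring_fd,
                         IORING_REGISTER_BUFFERS, iovs, nr);
}
```

A caller would typically fill each iov_base from malloc(3) and keep the
memory alive for as long as the buffers stay registered.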

After a successful call, the supplied buffers are mapped into the
kernel and eligible for I/O.  To make use of them, the application
must specify the
.B IORING_OP_READ_FIXED
or
.B IORING_OP_WRITE_FIXED
opcodes in the submission queue entry (see the
.I struct io_uring_sqe
definition in
.BR io_uring_enter (2)),
and set the
.I buf_index
field to the desired buffer index.  The memory range described by the
submission queue entry's
.I addr
and
.I len
fields must fall within the indexed buffer.
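
The bounds rule above can be sketched as follows. range_in_buffer() and
prep_read_fixed() are hypothetical helpers, not kernel or liburing functions;
the sketch assumes a kernel header whose struct io_uring_sqe has the
buf_index field, and it only fills in one submission queue entry, leaving
queue management to the application.

```c
#include <linux/io_uring.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

/* Returns 1 if [addr, addr + len) lies inside the registered buffer buf. */
static int range_in_buffer(const struct iovec *buf, const void *addr,
                           size_t len)
{
    uintptr_t start = (uintptr_t) buf->iov_base;
    uintptr_t a = (uintptr_t) addr;

    if (a < start || a - start > buf->iov_len)
        return 0;
    return len <= buf->iov_len - (a - start);
}

/* Prepare sqe to read len bytes at file offset off into registered
 * buffer buf_index; addr must point inside that buffer. */
static void prep_read_fixed(struct io_uring_sqe *sqe, int file_fd, void *addr,
                            unsigned int len, uint64_t off, uint16_t buf_index)
{
    memset(sqe, 0, sizeof(*sqe));
    sqe->opcode = IORING_OP_READ_FIXED;
    sqe->fd = file_fd;
    sqe->addr = (uint64_t) (uintptr_t) addr;
    sqe->len = len;
    sqe->off = off;
    sqe->buf_index = buf_index;
}
```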

It is perfectly valid to set up a large buffer and then only use part
of it for an I/O, as long as the range is within the originally mapped
region.

An application can increase or decrease the size or number of
registered buffers by first unregistering the existing buffers, and
then issuing a new call to
.BR io_uring_register ()
with the new buffers.

Note that registering buffers will wait for the ring to idle. If the application
currently has requests in-flight, the registration will wait for those to
finish before proceeding.

An application need not unregister buffers explicitly before shutting
down the io_uring instance. Available since 5.1.

.TP
.B IORING_UNREGISTER_BUFFERS
This operation takes no argument, and
.I arg
must be passed as NULL.  All previously registered buffers associated
with the io_uring instance will be released. Available since 5.1.
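
The resize procedure described for
.B IORING_REGISTER_BUFFERS
(unregister, then register the new set) can be sketched as below, assuming
the raw system call is reached through syscall(2). ring_register() and
replace_buffers() are hypothetical names. The sketch treats ENXIO from the
unregister step as "nothing was registered yet" rather than as a failure.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/io_uring.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical thin wrapper around the raw system call. */
static int ring_register(int fd, unsigned int opcode, void *arg,
                         unsigned int nr_args)
{
    return (int) syscall(__NR_io_uring_register, fd, opcode, arg, nr_args);
}

/* Swap the registered buffer set for a new one. */
static int replace_buffers(int ring_fd, struct iovec *iovs, unsigned int nr)
{
    /* Unregistering takes no argument: arg is NULL and nr_args is 0. */
    if (ring_register(ring_fd, IORING_UNREGISTER_BUFFERS, NULL, 0) < 0 &&
        errno != ENXIO)
        return -1;
    return ring_register(ring_fd, IORING_REGISTER_BUFFERS, iovs, nr);
}
```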

.TP
.B IORING_REGISTER_FILES
Register files for I/O.
.I arg
contains a pointer to an array of
.I nr_args
file descriptors (signed 32-bit integers).

To make use of the registered files, the
.B IOSQE_FIXED_FILE
flag must be set in the
.I flags
member of the
.IR "struct io_uring_sqe" ,
and the
.I fd
member must be set to the index of the file in the file descriptor array.
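
A minimal sketch of registering files and then referring to one of them by
index. register_files() and sqe_use_fixed_file() are hypothetical helper
names, and the remaining submission queue entry fields are assumed to be
filled in elsewhere.

```c
#define _GNU_SOURCE
#include <linux/io_uring.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Register nr descriptors; fds[i] becomes fixed file index i. */
static int register_files(int ring_fd, const int *fds, unsigned int nr)
{
    return (int) syscall(__NR_io_uring_register, ring_fd,
                         IORING_REGISTER_FILES, fds, nr);
}

/* Make sqe refer to fixed file 'index' instead of a raw descriptor. */
static void sqe_use_fixed_file(struct io_uring_sqe *sqe, int index)
{
    sqe->flags |= IOSQE_FIXED_FILE;
    sqe->fd = index;
}
```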

The file set may be sparse, meaning that an entry in the array may be
set to
.BR \-1 .
See
.B IORING_REGISTER_FILES_UPDATE
for how to update files in place.
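
A sparse table can be built up front and populated later. The helper below
is a hypothetical sketch: it copies the descriptors that are already known
into the first slots and marks every remaining slot as empty with -1. The
returned array is what would be passed as
.I arg
to
.BR IORING_REGISTER_FILES .

```c
#include <stdlib.h>

/*
 * Build an n-slot table whose first 'used' slots hold fds[] and whose
 * remaining slots are empty (-1). Returns NULL if used > n or on
 * allocation failure; the caller frees the result.
 */
static int *make_sparse_table(unsigned int n, const int *fds,
                              unsigned int used)
{
    int *table;

    if (used > n)
        return NULL;
    table = calloc(n, sizeof(*table));
    if (table == NULL)
        return NULL;
    for (unsigned int i = 0; i < n; i++)
        table[i] = i < used ? fds[i] : -1;
    return table;
}
```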

Note that registering files will wait for the ring to idle. If the application
currently has requests in-flight, the registration will wait for those to
finish before proceeding. See
.B IORING_REGISTER_FILES_UPDATE
for how to update an existing set without that limitation.

Files are automatically unregistered when the io_uring instance is
torn down. An application need only unregister if it wishes to
register a new set of fds. Available since 5.1.

.TP
.B IORING_REGISTER_FILES_UPDATE
This operation replaces existing files in the registered file set with new
ones: it can turn a sparse entry (one whose fd is \-1) into a real one,
remove an existing entry (by passing \-1 as the new fd), or replace an
existing entry with a different file descriptor.

.I arg
must contain a pointer to a
.IR "struct io_uring_files_update" ,
which contains an offset at which to start the update and a pointer to an
array of file descriptors to use for the update.
.I nr_args
must contain the number of descriptors in the passed-in array. Available
since 5.5.

File descriptors can be skipped by setting them to
.BR IORING_REGISTER_FILES_SKIP .
Skipping an fd leaves the file currently registered at that index
untouched. Available since 5.12.
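
A sketch of an in-place update. update_files() is a hypothetical helper; it
stores the array address in the 64-bit fds member of
.IR "struct io_uring_files_update" .
The fallback definition of IORING_REGISTER_FILES_SKIP as -2 is an assumption
meant only for older headers that lack it. With an array such as
{ -1, IORING_REGISTER_FILES_SKIP, new_fd } starting at offset 0, the first
slot would be cleared, the second left untouched, and the third replaced.

```c
#define _GNU_SOURCE
#include <linux/io_uring.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef IORING_REGISTER_FILES_SKIP
#define IORING_REGISTER_FILES_SKIP (-2)   /* assumed value for old headers */
#endif

/* Replace registered slots [offset, offset + nr) with fds[0..nr). */
static int update_files(int ring_fd, unsigned int offset, int *fds,
                        unsigned int nr)
{
    struct io_uring_files_update up;

    memset(&up, 0, sizeof(up));              /* reserved fields must be 0 */
    up.offset = offset;
    up.fds = (uint64_t) (uintptr_t) fds;     /* array address as a u64 */
    return (int) syscall(__NR_io_uring_register, ring_fd,
                         IORING_REGISTER_FILES_UPDATE, &up, nr);
}
```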

.TP
.B IORING_UNREGISTER_FILES
This operation requires no argument, and
.I arg
must be passed as NULL.  All previously registered files associated
with the io_uring instance will be unregistered. Available since 5.1.

.TP
.B IORING_REGISTER_EVENTFD
It is possible to use
.BR eventfd (2)
to get notified of completion events on an
io_uring instance. If this is desired, an eventfd file descriptor can be
registered through this operation.
.I arg
must contain a pointer to the eventfd file descriptor, and
.I nr_args
must be 1. Available since 5.2.
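
A minimal sketch of creating and registering an eventfd; attach_eventfd() is
a hypothetical helper. After registration, a successful 8-byte
.BR read (2)
from the returned descriptor indicates that completion events have been
posted since the previous read.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/io_uring.h>
#include <sys/eventfd.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Create an eventfd and register it; returns the eventfd or -1. */
static int attach_eventfd(int ring_fd)
{
    int efd = eventfd(0, EFD_CLOEXEC);

    if (efd < 0)
        return -1;
    /* arg points at the eventfd descriptor; nr_args must be 1. */
    if (syscall(__NR_io_uring_register, ring_fd,
                IORING_REGISTER_EVENTFD, &efd, 1) < 0) {
        int saved = errno;    /* close() may overwrite errno */
        close(efd);
        errno = saved;
        return -1;
    }
    return efd;
}
```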

An application can temporarily disable notifications through the
registered eventfd by setting the
.B IORING_CQ_EVENTFD_DISABLED
bit in the
.I flags
field of the CQ ring, and re-enable them by clearing the bit.
Available since 5.8.
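
The bit is toggled in shared memory rather than through
.BR io_uring_register ().
In this sketch, cq_flags is assumed to point at the CQ ring flags word, that
is, the mapped CQ ring base plus the cq_off.flags offset reported by
.BR io_uring_setup (2).
The helper name, the fallback definition of the flag, and the use of GCC
atomic builtins are assumptions.

```c
#include <linux/io_uring.h>

#ifndef IORING_CQ_EVENTFD_DISABLED
#define IORING_CQ_EVENTFD_DISABLED (1U << 0)   /* assumed for old headers */
#endif

/* Set or clear the eventfd-disable bit in the shared CQ ring flags. */
static void set_eventfd_enabled(unsigned int *cq_flags, int enabled)
{
    unsigned int flags = __atomic_load_n(cq_flags, __ATOMIC_RELAXED);

    if (enabled)
        flags &= ~IORING_CQ_EVENTFD_DISABLED;
    else
        flags |= IORING_CQ_EVENTFD_DISABLED;
    __atomic_store_n(cq_flags, flags, __ATOMIC_RELAXED);
}
```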

.TP
.B IORING_REGISTER_EVENTFD_ASYNC
This works just like
.BR IORING_REGISTER_EVENTFD ,
except notifications are only posted for events that complete in an async
manner. This means that events that complete inline while being submitted
do not trigger a notification event. The arguments supplied are the same as
for
.BR IORING_REGISTER_EVENTFD .
Available since 5.6.

.TP
.B IORING_UNREGISTER_EVENTFD
Unregister an eventfd file descriptor to stop notifications. Since only one
eventfd descriptor is currently supported, this operation takes no argument,
and
.I arg
must be passed as NULL and
.I nr_args
must be zero. Available since 5.2.

.TP
.B IORING_REGISTER_PROBE
This operation fills in a
.IR "struct io_uring_probe" ,
which contains information about the opcodes supported by io_uring on the
running kernel.
.I arg
must contain a pointer to a zero-initialized
.IR "struct io_uring_probe" ,
and
.I nr_args
must contain the size of the ops array in that probe struct. The ops array
is of the type
.IR "struct io_uring_probe_op" ,
which holds the value of the opcode and a flags field. If the flags field has
.B IO_URING_OP_SUPPORTED
set, then this opcode is supported on the running kernel. Available since 5.6.
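
A sketch of probing for opcode support. get_probe() and op_supported() are
hypothetical helpers, and the 256-entry ops array is an arbitrary size chosen
to exceed the number of opcodes. The probe is allocated with
.BR calloc (3)
so that it starts out zeroed, and the lookup treats any opcode beyond
last_op or ops_len as unsupported.

```c
#define _GNU_SOURCE
#include <linux/io_uring.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

#define PROBE_OPS 256   /* assumption: room for every current opcode */

/* Allocate a zeroed probe and let the kernel fill it; NULL on failure. */
static struct io_uring_probe *get_probe(int ring_fd)
{
    struct io_uring_probe *p =
        calloc(1, sizeof(*p) + PROBE_OPS * sizeof(p->ops[0]));

    if (p == NULL)
        return NULL;
    if (syscall(__NR_io_uring_register, ring_fd,
                IORING_REGISTER_PROBE, p, PROBE_OPS) < 0) {
        free(p);
        return NULL;
    }
    return p;
}

/* Returns 1 if the probe reports opcode op as supported. */
static int op_supported(const struct io_uring_probe *p, unsigned int op)
{
    if (op > p->last_op || op >= p->ops_len)
        return 0;
    return (p->ops[op].flags & IO_URING_OP_SUPPORTED) != 0;
}
```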

.TP
.B IORING_REGISTER_PERSONALITY
This operation registers credentials of the running application with io_uring,
and returns an id associated with these credentials. Applications wishing to
share a ring between separate users/processes can pass in this credential id
in the
.I personality
field of the submission queue entry. If set, that particular sqe will be
issued with these credentials. This operation must be invoked with
.I arg
set to NULL and
.I nr_args
set to zero. Available since 5.6.

.TP
.B IORING_UNREGISTER_PERSONALITY
This operation unregisters a previously registered personality with io_uring.
.I nr_args
must be set to the id in question, and
.I arg
must be set to NULL. Available since 5.6.
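
A minimal sketch of the personality life cycle; all three helper names are
hypothetical. Note that registration returns the id, unregistration carries
the id in
.I nr_args
rather than in
.IR arg ,
and the sqe personality field is 16 bits wide.

```c
#define _GNU_SOURCE
#include <linux/io_uring.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Register the caller's credentials; returns the personality id or -1. */
static int register_personality(int ring_fd)
{
    return (int) syscall(__NR_io_uring_register, ring_fd,
                         IORING_REGISTER_PERSONALITY, NULL, 0);
}

/* Drop a personality: the id goes in nr_args, arg stays NULL. */
static int unregister_personality(int ring_fd, int id)
{
    return (int) syscall(__NR_io_uring_register, ring_fd,
                         IORING_UNREGISTER_PERSONALITY, NULL,
                         (unsigned int) id);
}

/* Issue this sqe with the credentials registered under id. */
static void sqe_set_personality(struct io_uring_sqe *sqe, int id)
{
    sqe->personality = (uint16_t) id;
}
```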

.TP
.B IORING_REGISTER_ENABLE_RINGS
This operation enables an io_uring ring started in a disabled state
.RB ( IORING_SETUP_R_DISABLED
was specified in the call to
.BR io_uring_setup (2)).
While the io_uring ring is disabled, submissions are not allowed and
registrations are not restricted.

After the execution of this operation, the io_uring ring is enabled:
submissions and registrations are allowed, but they will
be validated against the registered restrictions (if any).
This operation takes no argument; it must be invoked with
.I arg
set to NULL and
.I nr_args
set to zero. Available since 5.10.
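
A sketch of creating a ring in the disabled state and enabling it later.
Mapping the submission and completion queues is omitted, and
setup_disabled_ring() and enable_ring() are hypothetical names; any
restrictions or other registrations would be performed between the two
calls.

```c
#define _GNU_SOURCE
#include <linux/io_uring.h>
#include <stddef.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Create a ring that starts disabled; returns the ring fd or -1. */
static int setup_disabled_ring(unsigned int entries,
                               struct io_uring_params *p)
{
    memset(p, 0, sizeof(*p));
    p->flags = IORING_SETUP_R_DISABLED;
    return (int) syscall(__NR_io_uring_setup, entries, p);
}

/* Enable a disabled ring: no argument, so arg is NULL and nr_args 0. */
static int enable_ring(int ring_fd)
{
    return (int) syscall(__NR_io_uring_register, ring_fd,
                         IORING_REGISTER_ENABLE_RINGS, NULL, 0);
}
```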

.TP
.B IORING_REGISTER_RESTRICTIONS
.I arg
points to a
.I struct io_uring_restriction
array of
.I nr_args
entries.

Each entry can allow one
.BR io_uring_register ()
.IR opcode ,
allow one submission queue entry
.IR opcode ,
specify which submission queue entry
.I flags
are allowed, or require certain
.I flags
to be specified (these flags must then be set on every submission queue
entry).

All the restrictions must be submitted with a single
.BR io_uring_register ()
call, and they are handled as an allowlist (opcodes and flags that are not
registered are not allowed).

Restrictions can be registered only if the io_uring ring started in a disabled
state
.RB ( IORING_SETUP_R_DISABLED
must be specified in the call to
.BR io_uring_setup (2)).

Available since 5.10.
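
A hypothetical allowlist, sketched under the assumption that the
application only needs vectored reads and writes on registered files.
build_restrictions() fills one entry per rule, and apply_restrictions()
submits them all in a single call while the ring is still disabled. Because
IOSQE_FIXED_FILE is required, files would need to be registered before
.BR IORING_REGISTER_ENABLE_RINGS .

```c
#define _GNU_SOURCE
#include <linux/io_uring.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#define NR_RESTRICTIONS 4

/* Allow READV/WRITEV SQEs only, require registered files, and permit
 * only IORING_REGISTER_FILES_UPDATE once the ring is enabled. */
static unsigned int build_restrictions(struct io_uring_restriction *res)
{
    memset(res, 0, NR_RESTRICTIONS * sizeof(res[0]));
    res[0].opcode = IORING_RESTRICTION_SQE_OP;
    res[0].sqe_op = IORING_OP_READV;
    res[1].opcode = IORING_RESTRICTION_SQE_OP;
    res[1].sqe_op = IORING_OP_WRITEV;
    res[2].opcode = IORING_RESTRICTION_SQE_FLAGS_REQUIRED;
    res[2].sqe_flags = IOSQE_FIXED_FILE;
    res[3].opcode = IORING_RESTRICTION_REGISTER_OP;
    res[3].register_op = IORING_REGISTER_FILES_UPDATE;
    return NR_RESTRICTIONS;
}

/* Submit every restriction at once, before enabling the ring. */
static int apply_restrictions(int ring_fd, struct io_uring_restriction *res,
                              unsigned int nr)
{
    return (int) syscall(__NR_io_uring_register, ring_fd,
                         IORING_REGISTER_RESTRICTIONS, res, nr);
}
```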

.SH RETURN VALUE

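On success,
.BR io_uring_register ()
returns 0 or a positive value that depends on the opcode; for example,
.B IORING_REGISTER_PERSONALITY
returns the new credential id, and
.B IORING_REGISTER_FILES_UPDATE
returns the number of file descriptors updated.
On error, \-1 is returned, and
.I errno
is set accordingly.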
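
Because the C library may not provide a wrapper for this system call, the
sketch below invokes it through
.BR syscall (2)
and folds the convention described above into a single return value: the
non-negative result on success, or the negated errno value on failure. The
name ring_register_checked() is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/io_uring.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Returns 0 or an opcode-specific positive value, or -errno on error. */
static int ring_register_checked(int fd, unsigned int opcode, void *arg,
                                 unsigned int nr_args)
{
    long ret = syscall(__NR_io_uring_register, fd, opcode, arg, nr_args);

    return ret < 0 ? -errno : (int) ret;
}
```

Returning a negated error code keeps the failure reason available even if
later library calls overwrite errno.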

.SH ERRORS
.TP
.B EACCES
The
.I opcode
field is not allowed due to registered restrictions.
.TP
.B EBADF
One or more file descriptors in the array passed with
.B IORING_REGISTER_FILES
or
.B IORING_REGISTER_FILES_UPDATE
are invalid.
.TP
.B EBADFD
.B IORING_REGISTER_ENABLE_RINGS
or
.B IORING_REGISTER_RESTRICTIONS
was specified, but the io_uring ring is not disabled.
.TP
.B EBUSY
.B IORING_REGISTER_BUFFERS
or
.B IORING_REGISTER_FILES
or
.B IORING_REGISTER_RESTRICTIONS
was specified, but there were already buffers, files, or restrictions
registered.
.TP
.B EFAULT
A buffer is outside the process's accessible address space, or
.I iov_len
is greater than 1GiB.
.TP
.B EINVAL
.B IORING_REGISTER_BUFFERS
or
.B IORING_REGISTER_FILES
was specified, but
.I nr_args
is 0.
.TP
.B EINVAL
.B IORING_REGISTER_BUFFERS
was specified, but
.I nr_args
exceeds
.B UIO_MAXIOV
.TP
.B EINVAL
.B IORING_UNREGISTER_BUFFERS
or
.B IORING_UNREGISTER_FILES
was specified, and
.I nr_args
is non-zero or
.I arg
is non-NULL.
.TP
.B EINVAL
.B IORING_REGISTER_RESTRICTIONS
was specified, but
.I nr_args
exceeds the maximum allowed number of restrictions, or a restriction
.I opcode
is invalid.
.TP
.B EMFILE
.B IORING_REGISTER_FILES
was specified and
.I nr_args
exceeds the maximum allowed number of files in a fixed file set.
.TP
.B EMFILE
.B IORING_REGISTER_FILES
was specified and adding
.I nr_args
file references would exceed the maximum number of files the user
is allowed to have according to the
.B RLIMIT_NOFILE
resource limit and the caller does not have the
.B CAP_SYS_RESOURCE
capability. Note that this is a per-user limit, not a per-process one.
.TP
.B ENOMEM
Insufficient kernel resources are available, or the caller had a
non-zero
.B RLIMIT_MEMLOCK
soft resource limit, but tried to lock more memory than the limit
permitted.  This limit is not enforced if the process is privileged
.RB ( CAP_IPC_LOCK ).
.TP
.B ENXIO
.B IORING_UNREGISTER_BUFFERS
or
.B IORING_UNREGISTER_FILES
was specified, but there were no buffers or files registered.
.TP
.B ENXIO
Attempt to register files or buffers on an io_uring instance that is already
undergoing file or buffer registration, or is being torn down.
.TP
.B EOPNOTSUPP
User buffers point to file-backed memory.
